10 research outputs found

    Classification of Resilience Techniques Against Functional Errors at Higher Abstraction Layers of Digital Systems

    Get PDF
Nanoscale technology nodes bring reliability concerns back to the center stage of digital system design. A systematic classification of approaches that increase system resilience in the presence of functional hardware (HW)-induced errors is presented, dealing with higher system abstractions, such as the (micro)architecture, the mapping, and platform software (SW). The field is surveyed in a systematic way based on nonoverlapping categories, which add insight into the ongoing work by exposing similarities and differences. HW and SW solutions are discussed in a similar fashion so that interrelationships become apparent. The presented categories are illustrated by representative literature examples that demonstrate their properties. Moreover, it is demonstrated how hybrid schemes can be decomposed into their primitive components.

    Stochastic Approaches for Speeding-Up the Analysis of the Propagation of Hardware-Induced Errors and Characterization of System-Level Mitigation Schemes in Digital Communication Systems

    No full text
Today's nanoscale technology nodes are bringing reliability concerns back to the center stage of digital system design because of issues like process variability, noise effects, radiation particles, and increasing variability at run time. Alleviating these effects can become very costly, and the benefits of technology scaling can be significantly reduced or even lost. To build more robust digital systems, their behavior in the presence of hardware-induced bit errors must first be analyzed. In many systems, certain types of errors can be tolerated, and such an analysis reveals these cases: overhead can be avoided and remedial measures applied only where needed. Communication systems are an interesting domain for such explorations: first, they have high societal relevance due to their ubiquity; second, they can potentially tolerate hardware-induced errors thanks to the built-in redundancy they already carry to cope with channel noise. This work focuses on analyzing the impact of such errors on the behavior of communication systems. Typically, error-propagation studies are performed through time-consuming fault-injection campaigns, which do not scale well with growing system sizes. Stochastic experiments allow a more time-efficient approach. In addition, breaking the system down into subsystems and propagating error statistics through each of them further improves the speed-up and flexibility of reliability evaluation for complex systems. As an initial step in this thesis, statistical moments are propagated through the signal flows of Linear Time-Invariant (LTI) blocks. Such a scheme, although fast, can only be applied when the signal lacks autocorrelation. However, autocorrelation can be introduced into the signal for various reasons, for example by signal-processing blocks. In that case, other approaches are available to reduce the computational cost of the necessary (repetitive) experiments, such as Principal Component Analysis (PCA). The benefits of such a technique depend on several parameters, so a more broadly usable technique is required. To address this need, a framework is proposed that exploits the repetitive nature of fault-injection experiments for speed-up in LTI blocks. Two cases are distinguished: one in which all operators of the LTI block act in a linear time-invariant way, and one in which non-linear operations due to finite wordlengths are present. To complement the subject matter, the broad range of hardware-based mitigation techniques at the higher system levels is explored and characterized. In this way, the main properties of each mitigation category are identified, so that suitable choices can be made according to the application's needs.
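    As a point of reference for the moment-propagation idea, a minimal sketch follows. It assumes mutually uncorrelated inputs (the no-autocorrelation case named in the abstract), and every function name and value in it is illustrative rather than taken from the thesis.

```python
# Minimal sketch: propagating the first two statistical moments through
# a linear time-invariant (LTI) combination y = sum_i a_i * x_i, assuming
# the inputs x_i are mutually uncorrelated (no autocorrelation).
# All names and values are illustrative, not from the thesis.

def propagate_moments(coeffs, means, variances):
    """Return (mean, variance) of y = sum(a_i * x_i).

    Under the no-correlation assumption:
      E[y]   = sum(a_i * E[x_i])
      Var[y] = sum(a_i**2 * Var[x_i])
    """
    mean_y = sum(a * m for a, m in zip(coeffs, means))
    var_y = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_y, var_y

# Example: a 3-tap FIR-like combination of per-input error statistics.
coeffs = [0.5, 0.3, 0.2]
means = [0.0, 0.1, 0.0]          # per-input error means
variances = [1e-4, 2e-4, 1e-4]   # per-input error variances
print(propagate_moments(coeffs, means, variances))
```

    Because each such evaluation is closed-form, it replaces a full fault-injection run for this class of blocks, which is where the speed-up comes from.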

    On the use of analytical techniques for reliability analysis in presence of hardware-induced errors

    No full text

    Sub-word handling in data-parallel mapping

    No full text

    Parentheses and processes of thematic configuration: towards a redefinition of the notion of topic

    No full text
    Several techniques aiming to improve power efficiency (measured as the energy-delay product, EDP) in out-of-order cores trade performance for energy. Prime examples are techniques that resize the instruction queue (IQ). While most of them produce good results, they fail to take into account that changing the timing of memory accesses can have significant consequences on the memory-level parallelism (MLP) of the application and can thus incur disproportionate performance degradation. We propose a novel mechanism that addresses this observation by collecting fine-grained information about the maximum IQ resizing that does not affect the MLP of the program. This information is used to override the resizing enforced by feedback mechanisms whenever that resizing might reduce MLP. We compare our technique to a previously proposed non-MLP-aware management technique, and our results show a significant increase in EDP savings for most benchmarks of the SPEC2000 suite.
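    As a rough illustration of the override idea (a sketch under assumptions, not the paper's actual mechanism; every name, size, and threshold below is hypothetical), the MLP-aware information can be thought of as a lower bound on IQ size that vetoes feedback-driven shrinking.

```python
# Illustrative sketch of MLP-aware instruction-queue (IQ) resizing.
# All names, sizes, and thresholds are hypothetical; the paper's actual
# mechanism works at a much finer hardware granularity.
# EDP (energy-delay product) = energy * delay, the metric being optimized.

IQ_MIN, IQ_MAX = 16, 128

def mlp_floor(miss_overlap_profile):
    """Smallest IQ size that still exposes the observed overlapping
    long-latency misses (i.e., preserves memory-level parallelism).

    Assume the profile records, per interval, the distance (in in-flight
    instructions) between overlapping long-latency misses.
    """
    return max(IQ_MIN, max(miss_overlap_profile))

def next_iq_size(feedback_proposal, miss_overlap_profile):
    """Apply the feedback controller's proposal, but never shrink the
    IQ below the size needed to keep the misses overlapping."""
    floor = mlp_floor(miss_overlap_profile)
    return min(IQ_MAX, max(feedback_proposal, floor))

# Example: the feedback loop proposes 32 entries, but overlapping misses
# were observed up to 48 instructions apart, so the override keeps 48.
print(next_iq_size(32, [12, 48, 30]))  # -> 48
```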

    Simulating the Cortical Microcircuit Significantly Faster Than Real Time on the IBM INC-3000 Neural Supercomputer

    No full text
    This article employs the new IBM INC-3000 prototype FPGA-based neural supercomputer to implement a widely used model of the cortical microcircuit. With approximately 80,000 neurons and 300 million synapses, this model has become a benchmark network for comparing simulation architectures with regard to performance. To the best of our knowledge, the achieved speed-up factor is 2.4 times larger than the highest speed-up factor reported in the literature, and the simulation runs four times faster than biological real time, demonstrating the potential of FPGA systems for neural modeling. The work was performed at Jülich Research Centre in Germany, and the INC-3000 was built at the IBM Almaden Research Center in San Jose, CA, United States. For the simulation of the microcircuit, only the programmable-logic part of the FPGA nodes is used. All arithmetic is implemented in single-precision floating point. The original microcircuit network, with linear leaky integrate-and-fire (LIF) neurons and current-based synapses with exponential-decay, alpha-function, and beta-function shapes, was simulated using exact exponential integration as the ODE solver method. To demonstrate the flexibility of the approach, networks with non-linear neuron models (AdEx, Izhikevich) and conductance-based synapses were additionally simulated, applying Runge–Kutta and Parker–Sochacki solver methods. In all cases, the simulation-time speed-up factor decreased by no more than a few percent. The speed-up factor turns out to be essentially limited by the latency of the INC-3000 communication system.
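    For orientation, an exact-exponential-integration step for a LIF neuron can be sketched as below. This is a generic textbook formulation under made-up parameter values, not the INC-3000 implementation: for a linear membrane equation with input held constant over the step, the update uses the closed-form solution of the ODE, so the step size introduces no integration error.

```python
import math

# Generic exact-exponential-integration step for a leaky
# integrate-and-fire (LIF) neuron with the input current held constant
# over each step. Textbook formulation; all parameter values are
# illustrative and this is not the INC-3000 implementation.

TAU_M = 10.0     # membrane time constant (ms)
C_M = 250.0      # membrane capacitance (pF)
V_REST = -65.0   # resting potential (mV)
V_TH = -50.0     # spike threshold (mV)
V_RESET = -65.0  # post-spike reset potential (mV)

DT = 0.1  # simulation step (ms)
# Precompute the exact propagator for the linear membrane equation
# dv/dt = -(v - V_REST)/TAU_M + I/C_M.
P = math.exp(-DT / TAU_M)

def step(v, i_syn):
    """Advance the membrane potential by one step, exactly."""
    v_inf = V_REST + i_syn * TAU_M / C_M  # steady state under current i_syn
    v = v_inf + (v - v_inf) * P           # exact solution of the linear ODE
    if v >= V_TH:                          # threshold crossing: emit a spike
        return V_RESET, True
    return v, False

# Drive the neuron with a constant 400 pA current for 100 ms.
v, spikes = V_REST, 0
for _ in range(int(100 / DT)):
    v, fired = step(v, 400.0)
    spikes += fired
print(spikes, "spikes")
```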

    Classification Framework for Analysis and Modeling of Physically Induced Reliability Violations

    Get PDF
    Technology downscaling is expected to amplify a variety of reliability concerns in future digital systems. A good understanding of reliability threats is crucial for the creation of efficient mitigation techniques. This survey performs a systematic classification of the state of the art on the analysis and modeling of such threats, which physical mechanisms pose to digital systems. The purpose of this article is to provide a classification tool that aids navigation across the entire landscape of reliability analysis and modeling. A classification framework is constructed in a top-down fashion from complementary categories, each one addressing an approach to reliability analysis and modeling. In comparison to other classifications, the proposed methodology approaches the target research domain in a complete way, without suppressing hybrid works that fall under multiple categories. To substantiate the usability of the classification framework, representative works from the state of the art are mapped to each appropriate category and briefly analyzed. Thus, research trends and opportunities for novel approaches can be identified.